CS225
Data Structures 
Notes Packet #8 
Algorithms and 
Abstract Data Types 


Jason Zych 


©1997, 1998, 1999 Jason Zych 


Chapter 1 


Algorithms and Abstract Data Types 


1.1 Definition of an algorithm 


An algorithm is a computational method for solving a problem. 
Implicit in that definition are three key points: 


1. An algorithm is a computational method. 
2. An algorithm is a solution, presumably a correct one. 
3. An algorithm solves a problem. 


We will examine each of these points in detail. 


1.1.1 Computability 


The word “computational” is very important. Why? Well, for example, consider a program 
that possibly has an infinite loop. It is your task to decide if there is indeed an infinite loop in 
the program or not. How do you do this? 

Well, you might begin by simply looking at the code, seeing if any “mistakes” jump out at 
you — for example, a for-loop that has an upper bound but counts downward. This idea sounds 
plausible enough, but...what exactly is guiding you here? You certainly can put together a list 
of obvious things to look for (the miswritten for-loop, for example), but for the most part, you 
are relying on your intuition of what you “shouldn’t see” in a program. There may be a problem 
that jumps out at you but was not on your list of things to look for. 

Perhaps, though, you have a perfect list of things to look for — all of the obvious problems 
that signify an infinite loop. Certainly, there are programs with infinite loops that wouldn’t be 
caught by that list. For example, perhaps you are in a while-loop, depending on the results 
of some repeated calculation in order to exit the loop. You have no way of knowing, just by 
looking at the code, whether or not that computation will ever be what you need to exit the 
loop. You would have to start tracing through the code to see what values are produced by the 
computation. So, you might attempt to trace through the code, until you either successfully 
leave the loop, or else can “tell” that you are in an infinite loop. But, how do you “tell”? Maybe 
you encounter the same value 3 times in a row for your calculation, when you know that the 
value should be increasing. Certainly, if you knew the value should be increasing, this would be 
a big hint that something is wrong. But, what if you didn’t know that? Or, worse yet, what if
you had a program in which this wasn’t the case? A loop that depends on a certain value being
true to continue will always be checking to see if a certain value is 1 or 0. That value may be 1 
many times in a row, and will be 0 only on the final check. But, that doesn’t mean you are in an 
infinite loop. You may still be in an infinite loop here, but for completely different reasons. So, 
“look for a value that stays the same 3 consecutive times” is a rule that may reveal some infinite 
loops, but not all, and may falsely identify as infinite loops some loops that are not infinite. It 
is not a correct catch-all rule. It may work sometimes, but not always. 
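As one illustration of a while-loop whose exit depends on a repeated calculation, consider the sketch below. The function name `collatz_steps` and the choice of this particular calculation (the well-known "3n + 1" iteration) are assumptions made for illustration; they are not part of the original notes. Nothing in the code itself tells you whether the loop exits for every starting value of 1 or more, and whether it always does is a long-standing open question in mathematics.

```cpp
#include <cstdint>

// Counts how many times the loop body runs before n reaches 1.
// Hypothetical example: the exit condition depends on a repeated
// calculation, so reading the code alone does not reveal whether
// the loop terminates for every starting value.
std::uint64_t collatz_steps(std::uint64_t n)
{
    std::uint64_t steps = 0;
    while (n != 1) {
        if (n % 2 == 0)
            n = n / 2;        // even: halve the value
        else
            n = 3 * n + 1;    // odd: triple it and add one
        ++steps;
    }
    return steps;
}
```

Calling `collatz_steps(6)` returns after 8 passes through the loop, while a starting value of 0 never leaves it, since 0 is even and halving it gives 0 again.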

In fact, try as you might, you will never be able to come up with a rule, or even a large
set of rules, that will collectively determine if a program has an infinite loop for any possible
choice of input program. You might think this assertion is ridiculous — after all, if it were true, 
how could you possibly debug? You detect infinite loops all the time, right? Well, you do, but 
you don’t do it using only a set of rules. You use your own human intuition, you use your own 
knowledge about that particular program, and you use your own knowledge about what spot 
the program seemed to have reached the last time you ran it. You are doing far more than just 
running your code through a general step by step checklist of rules or steps in order to detect 
the loop. 

In other words, your intuition and experience can help you find infinite loops, but there is no 
set, general, step-by-step procedure for doing so. (We will discuss this again a little later in the 
semester.) A computer cannot use intuition or experience, though — it must have a step-by-step 
procedure to follow. So, if no such step-by-step procedure exists, the problem cannot be solved 
by a computer. 

An algorithm, then, cannot use intuition. A more expanded definition of “algorithm” might
be “a solution that you could actually code into a computer and run to solve the problem”. 
Intuition and experience are not “computational” and hence cannot be used in the description 
of algorithms. 


1.1.2 Correctness 


We are going to rely on algorithms to solve problems for us. As a result, it is necessary that 
whatever algorithm we choose correctly does what we need it to do. For example, consider an 
algorithm that adds two integers and returns the sum. If this algorithm only returns the correct 
sum occasionally, then it is useless! The results of this calculation are important, and could be 
used in anything from a business spreadsheet to a scientific calculation to an air-traffic control 
system. Our applications can only make correct decisions if they have correct data to work with. 
So, it is necessary that any algorithm we write be correct, for any possible input data, and at 
any possible time. 


1.1.3 Generality 


Our algorithm is designed to solve a problem. In most cases, the problems we are trying to solve 
are of a general nature, and so we generally want an algorithm to work on any legal data, and 
thus define it that way. Certainly, if we knew program A had an infinite loop, we could code an 
algorithm to take program A as input and return “yes” as output. For example, the following 
program: 


#include <iostream>

int main()
{
    std::cout << "yes";
    return 0;
}

will do just that. Whenever we pass program A to this algorithm, it will correctly tell us that 
program A has an infinite loop. However, this algorithm is completely unreliable for other 
input. It will return “yes” for every program, even those that don’t have infinite loops. So, such 
an algorithm is useless. We would want an algorithm to work for all legal data, not just one 
particular piece of input. So, when we refer to “solving a problem”, we mean “a problem” in 
the general sense, not for a particular piece of data. 


1.2 Algorithm Speed


Among other things, we are concerned with how fast an algorithm runs. Certainly, if we were 
faced with two algorithms to solve a problem, and one worked faster than the other, we’d want 
to use the faster one, right? Well, it depends. (You will find that the answer to many questions 
we will encounter is “it depends”.) Faster when? For what data? How do we even measure 
algorithm speed? 

As a first attempt to measure algorithm speed, we could take an algorithm and some data, 
and run it on a machine (call it machine A): 





               Machine A
Algorithm 1    11.8 seconds





This number is useless, of course, without something to compare it to. So, let’s compare it 
to the performance of a second algorithm for this particular problem on the same machine. 





               Machine A
Algorithm 1    11.8 seconds
Algorithm 2    23.6 seconds




















At first glance, you might think that we can safely claim that algorithm 1 is over 10 seconds 
faster than algorithm 2. However, our above data is very specific. For example, it only represents 
a single machine. Let’s try the same two algorithms on a different machine. 





               Machine A       Machine B
Algorithm 1    11.8 seconds    5.9 seconds
Algorithm 2    23.6 seconds    11.8 seconds




















On machine B, the difference between the two algorithms is only 5.9 seconds. On a third
machine, the difference between the two algorithms could be entirely different. So, perhaps we
shouldn’t measure the algorithm running times strictly in seconds, since such a measurement is too
machine-dependent. It appears, though, that it would be safe to claim that Algorithm 1 is “twice
as fast” as Algorithm 2, and that the statement would be machine independent.
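A wall-clock measurement of the kind described here could be taken with a small helper such as the one below. This is only a sketch: the name `seconds_to_run` is hypothetical, and the number it returns depends on the machine, the compiler, and whatever else the machine is doing, which is exactly why raw seconds are a poor basis for comparing algorithms.

```cpp
#include <chrono>

// Runs `work` once and returns the elapsed wall-clock time in seconds.
// The result is machine-dependent; it is not a property of the algorithm.
template <typename Work>
double seconds_to_run(Work work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Any callable can be passed in, for example a lambda that runs the algorithm on a prepared data set.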

However, we still have too many restrictions. The above times are for a single run of the 
algorithm, on a single set of data. Larger or smaller sets of data may produce completely 
different running times. For example, on Machine A, our original data set that produced our 
original results was of size 10. Let’s also run the algorithm on Machine A on a data set of size 
20: 





Data set size    Algorithm 1      Algorithm 2
10 elements      11.8 seconds     23.6 seconds
20 elements      47.2 seconds     47.2 seconds








When we have 20 elements, the running times are equal! Perhaps we should examine the 
results of even larger data set sizes. 





Data set size    Algorithm 1      Algorithm 2
10 elements      11.8 seconds     23.6 seconds
20 elements      47.2 seconds     47.2 seconds
30 elements      106.2 seconds    70.8 seconds
40 elements      188.8 seconds    94.4 seconds








As we add more elements, the running time of Algorithm 1 increases much more rapidly
than the running time of Algorithm 2.

So, armed with this information, we might say that for all data sets except for very small data
sets, Algorithm 2 is faster than Algorithm 1. But, how much faster? Consider the following
formulas:

Alg1Time(n) = 0.118n^2

Alg2Time(n) = 2.36n


If we were to graph these two equations, we would get: 


[Graph: Alg. 1: g(n) = 0.118n^2 and Alg. 2: f(n) = 2.36n plotted against n; the two curves meet at n = 20, where both equal 47.2.]


Notice that as n gets larger, the running times of Algorithm 1 and Algorithm 2 get further and 
further apart. That is the effect we ultimately want to describe. We want to discuss the order 
of growth of an algorithm — how fast its running time increases as we increase the size of the 
data set. If Algorithm 1 has a larger order of growth than Algorithm 2, then we expect that
after some data size has been exceeded, Algorithm 2 will be faster than Algorithm 1 for any
larger data size. Algorithm 1’s time requirements grow so much faster than those of
Algorithm 2 that eventually Algorithm 1’s running time overtakes that of Algorithm 2, and from
then on, Algorithm 2 is faster.
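The crossover point can be found directly from the two running-time formulas of this section (0.118n^2 seconds and 2.36n seconds). The sketch below works in whole milliseconds so that the arithmetic stays exact; the function names are hypothetical.

```cpp
// Running-time formulas from this section, expressed in milliseconds
// so that the comparison uses exact integer arithmetic.
long long alg1_time_ms(long long n) { return 118 * n * n; }   // 0.118 n^2 seconds
long long alg2_time_ms(long long n) { return 2360 * n; }      // 2.36 n seconds

// Smallest data set size at which Algorithm 1 is strictly slower
// than Algorithm 2.
long long first_size_where_alg2_wins()
{
    long long n = 1;
    while (alg1_time_ms(n) <= alg2_time_ms(n))
        ++n;
    return n;
}
```

The two formulas agree at n = 20 (47200 milliseconds each), so the first size at which Algorithm 2 is strictly faster is n = 21.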


1.3 Big-O Notation


Big-O notation, or order notation, is a general way of describing algorithm resource requirements 
— usually time, but on occasion memory as well. Formally, we will say 


f(n) ∈ O(g(n)) if there exist constants c > 0 and n0 such that
f(n) ≤ c × g(n) for all n ≥ n0.


The n0 constant simply refers to the fact that we are not concerned with what goes on with
small values of n, but rather values of n after a certain point, i.e. progressively larger values of
n.


Example 1: f(n) = n, g(n) = n^2


[Figure 1.1: graphs of g(n) = n^2 and f(n) = n, with n = 1 marked.]


If we choose n0 = 1 and c = 1, we see that f(n) ≤ 1 × g(n) for all n ≥ 1. So f(n) ∈ O(g(n)).
We don’t care what goes on to the left of n = 1 (e.g. n = 1/2), only what goes on to the right of
n = 1. So, since the inequality is satisfied by selecting c = n0 = 1, n ∈ O(n^2).

Generally, if f(n) ∈ O(g(n)), we will say that f(n) = O(g(n)), so here, n = O(n^2). The truer
description is that the big-O of a function is a set, but saying f(n) = O(g(n)) is a conceptual 
and notational convenience that we prefer to use. 


Example 2: f(n) = 2n, g(n) = 3n + 2


[Figure 1.2: graphs of g(n) = 3n + 2 and f(n) = 2n.]


Choose c = 1, n0 = 0. 2n ≤ 3n + 2 for n ≥ n0. So, 2n = O(3n + 2).


Conceptually, O() is an “upper bound”. We are saying that n grows no faster than n^2, or
that 2n grows no faster than 3n + 2. 


Example 3: f(n) = 3n + 2, g(n) = 2n
Is 3n + 2 = O(2n)? Yes! Pick n0 = 2, c = 2. Then 3n + 2 ≤ 2(2n) = 4n for n ≥ 2.


[Figure 1.3: graphs of 2 × g(n) = 2 × 2n = 4n, f(n) = 3n + 2, and g(n) = 2n, with n = 2 marked.]


4n acts as an upper bound on 3n + 2. That is how the constant c comes into play with big-O
notation: it eliminates the “constant” effect of a function. One effect of this is that functions of
the same order are big-O of each other. All linear functions are O(n). All quadratic functions
are O(n^2). Because we can set c to any value we choose, we can cancel the constant terms, and
so the constant on the highest degree term doesn’t matter when describing a function’s order of
growth.


Example 4: f(n) = 6n^3 + 5n^2 + 2n + 3, g(n) = n^3
f(n) ≤ 100 × g(n) for all n ≥ 1. Therefore, f(n) = O(n^3).
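The constants chosen in Examples 1 through 3 can be spot-checked numerically. In the sketch below, the helper name `bound_holds_on_range` and the upper limit `n_max` are assumptions for illustration. A finite check like this can show that a particular choice of c and n0 fails, but it cannot prove that a bound holds for all n ≥ n0.

```cpp
#include <functional>

// Returns true if f(n) <= c * g(n) for every integer n in [n0, n_max].
// This only spot-checks a finite range; it is not a proof of the bound.
bool bound_holds_on_range(const std::function<double(double)>& f,
                          const std::function<double(double)>& g,
                          double c, int n0, int n_max)
{
    for (int n = n0; n <= n_max; ++n) {
        if (f(n) > c * g(n))
            return false;
    }
    return true;
}
```

For instance, with f(n) = 3n + 2, g(n) = 2n and c = 2, the check passes when started at n0 = 2 but fails when started at n0 = 1, because 3(1) + 2 = 5 exceeds 2 × 2(1) = 4.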


We have another operator that acts in the opposite manner — one that provides a measure 
of a lower bound. 


f(n) ∈ Ω(g(n)) if there exist constants c > 0 and n0 such that
f(n) ≥ c × g(n) for all n ≥ n0.


n^2 = Ω(n), because n^2 is bounded from below by n. Just as with O(), functions of the same
order of growth are Ω() of each other. For example, 3n + 2 = Ω(n).


Since, as we have discussed, functions of the same order (for example, two linear functions)
are both O() and Ω() of each other, we provide a notation to indicate such a “tight” bound:


f(n) ∈ Θ(g(n)) if f(n) = O(g(n)) and f(n) = Ω(g(n)).


Examples: 


n ≠ Θ(n^2)
n^2 ≠ Θ(n)
3n + 2 = Θ(n).


Θ() indicates a tight bound. If a function f() is Θ() of a function g(), then f() is necessarily
O(g()). However, the reverse is not true; it is possible that f() = O(g()) but f() ≠ Θ(g()). In
other words, it is possible that f() is bounded by g(), but not bounded tightly by g(). If that is
the case, then we can use the notation o() (called little-o) to describe this condition.


f(n) ∈ o(g(n)) if for each constant c > 0 there exists an n0 > 0 such that
f(n) < c × g(n) for all n ≥ n0.


Examples:


3n + 2 ≠ o(n)
n^2 ≠ o(n)
n = o(n^2)


Big-O is used very frequently; Theta and Omega notation are used less often, and little-o 
notation is rarely used. But, they do come up, so it is important to know what all four mean. 


1.4 Algorithm Analysis 


Algorithm analysis is the process of calculating the particular requirements of an algorithm 
— again, usually time requirements, but sometimes memory as well — by means of studying 
the algorithm line by line and working out the big-O representation of the total time of the 
algorithm. Since we are using big-O notation, we will be describing the algorithm’s performance 
for a general data set, that is, how much {time, memory} does this algorithm take for a data 
set of size n? 


We will examine a few basic analysis tools and methods here, and then develop these skills 
throughout the semester. 


Example 1: 


[1] for i = 1 to n
[2]     A[i] = i + 3;
[3] for i = n to 1
[4] {
[5]     cout << A[i];
[6]     cout << endl;
[7] }


How long does the above algorithm take to run? Certainly, it depends on the value of n, 
and so we will express the running time in terms of n. An array access or an array write takes 
only a constant amount of time; it doesn’t matter how many array operations we are doing, any 
single array access is still only O(1). So, line 2 is O(1). Then, the first for-loop results in line 2 
executing n times. If we perform an O(1) operation n times, the resultant time should be O(n). 
(Think of this as O(n * 1).) So, the first for-loop, overall, should run in O(n) time. As n grows, 
the time for this for-loop will grow proportionately. 

Likewise, lines 5 and 6 are each O(1) operations — neither depends on the size of n. So, 
lines 5 and 6 together will also take O(1) time, because O(1) + O(1) = O(1). Since the second 
for-loop also runs the operations inside it n times, and since those operations inside the for-loop 
are constant time, the second for-loop will take O(n) time, just like the first for-loop. 

So, there are two for-loops, each taking O(n) time. O(n) + O(n) = O(n), so the entire 
algorithm takes O(n) time overall. 

Actually, we can go so far as to say this algorithm takes Θ(n) time overall, since we know
that not only does the algorithm take time at most proportional to n, but that it takes time
exactly proportional to n. Remember that even a constant function is “technically” O(n), but
only linear functions are Θ(n). However, people will often simply use O() to convey information,
and not really concern themselves with whether the statement can in fact be “strengthened” to
Θ(). If you are being exact, then you should use the proper term, but it is usually okay to just
informally use a phrase such as “this is an order-n algorithm” or “this is an order-n-squared 
algorithm” and people understand that either you are speaking about a tight bound, or else 
whether the bound is tight or not doesn’t really matter in that conversation. 


Example 2: 


[1] for i = 1 to n
[2]     for j = 1 to n
[3]         cout << "Hello, world!" << endl;


(You knew it was coming! :-) ) 


Line 3 is O(1), making the internal for-loop O(n), because n * O(1) = O(n). So, if the
outer for-loop runs the inner for-loop n times, and the inner for-loop is O(n), then the outer
for-loop should run in time n * O(n), or O(n^2). As n grows, the 3-line algorithm runs in time
proportional to n^2.
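To see the quadratic growth without producing any output, the print statement of Example 2 can be replaced by a counter. The function name `count_example2` is hypothetical.

```cpp
// Counts how many times the innermost statement of Example 2 would run
// for a given n, without printing anything.
long long count_example2(int n)
{
    long long count = 0;
    for (int i = 1; i <= n; ++i)
        for (int j = 1; j <= n; ++j)
            ++count;                 // stands in for the O(1) print statement
    return count;
}
```

Going from n = 10 to n = 20 raises the count from 100 to 400: doubling the data size quadruples the work.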


Example 3: 


[1] for i = 1 to n
[2]     foo(i);


For the above algorithm, the running time depends on the running time of the function foo(). 
In the earlier examples, the running time was based on the running time of each individual step; 
likewise, here, the running time is also based on the running time of the individual steps. Earlier, 
however, we could very easily deduce the time an array access took, or the time a print statement 
took. Here, we have to figure out the time another function takes, and since we are not as familiar 
with this function as we are with array accesses and print statements, we cannot tell the running 
time as easily. 

If foo(), with one integer as data, is an O(1) function, then the entire loop is O(n), just
as in the earlier examples. What if foo(i) runs in time O(i) though? This could be the case if 
the function foo() was defined as follows: 


void foo(int i)
{
    for (int j = 1; j <= i; ++j)
        cout << "Hello." << endl;
}


In this case, foo() takes a different amount of time each step through the for-loop. On the 
first iteration of the loop, foo() will take O(1). On the second iteration of the loop, foo() will 
take O(2), which can still be thought of as O(1), but we will write O(2) for now. On the third 
iteration of the loop, foo() will take O(3) time. (Again, this is really O(1), but there’s a point 
to be made here.) And so on. 

Overall, the for-loop, and hence the algorithm, will take time equal to: 


O(1) + O(2) + O(3)+...+O(n-2) + O(n-1) + O(n) 


and this turns out to add up to O(n^2), since 1 + 2 + ... + n = n(n + 1)/2.
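The total can be checked by counting. In the sketch below, the inner loop plays the role of the body of foo(i), and the counter stands in for its print statement; the function name `count_example3` is hypothetical.

```cpp
// Counts how many times the print statement inside foo() would run
// when foo(i) is called for i = 1 through n.
long long count_example3(int n)
{
    long long count = 0;
    for (int i = 1; i <= n; ++i)         // outer loop: calls foo(i)
        for (int j = 1; j <= i; ++j)     // body of foo(i): i iterations
            ++count;
    return count;
}
```

For n = 100 the count is 5050, which matches n(n + 1)/2. That expression is roughly half of n^2, so the growth is quadratic even though most calls to foo() do less than n units of work.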

We will develop analysis skills as we analyze the various algorithms of the data structures 
we explore throughout the semester. This was simply intended to give you a few basic “building 
block” ideas so that you will better understand what we are trying to do as we analyze actual 
data structures.

One final note: we will often wish to perform two different types of analysis:


• Worst Case - given an algorithm, and the particular data set of size n that makes it run
the slowest of all data sets of size n, what is the running time?


• Average Case - given an algorithm and a “typical” data set of size n, what is the running
time?
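The difference between the two kinds of analysis shows up even in a very simple algorithm. The sketch below uses linear search, which is a hypothetical choice (the notes do not single out an algorithm here), and it treats a "typical" input as one where each stored element is equally likely to be the target, which is also an assumption.

```cpp
#include <vector>

// Linear search that reports how many elements it examined before
// finding the target (or before running out of elements).
int comparisons_to_find(const std::vector<int>& data, int target)
{
    int comparisons = 0;
    for (int value : data) {
        ++comparisons;
        if (value == target)
            break;
    }
    return comparisons;
}
```

With five elements, the worst case (target in the last position, or absent) examines all 5. Averaged over the five possible positions, the search examines (1 + 2 + 3 + 4 + 5) / 5 = 3 elements. Both grow in proportion to n, but they answer different questions.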




1.5 Abstract Data Types 


As we study data structures, we will find it helpful to describe them in a form that goes beyond 
any particular implementation, and in a form that does not depend on any particular language. 
Our study of data structures will be focused on what are known as abstract data types, 
or ADTs. Abstraction is a way of thinking about a data structure that is independent of the 
language used and independent of any particular implementation of that data structure. In 
addition, abstract data types provide many other benefits. 

In CS125, you were given your first exposure to object-oriented programming, commonly
abbreviated OOP. By programming with “objects”, we gain a great deal over the old methods 
of software design. Among the most notable of these benefits is that we have a stronger ability 
to actually model in our software the kinds of things we are dealing with in the real world. We 
can design classes to represent any number of real-world ideas, instead of having to break all our 
real-world ideas down into things like ints and floats for the computer. Eventually, of course,
we still need to break things down into the basic concepts that the computer can understand, 
but it is to our benefit to deal with high-level, real-world ideas for as long as possible, since 
we will be able to better model the problem and its solution if we actually think in terms of 
the problem and its solution, rather than thinking in terms of implementation constructs and 
language issues. 

Another huge benefit of OOP, though, is the ability to reuse software. There are certain 
ideas that occur frequently, in many different software projects, and it would be to our benefit 
to write the code to implement an idea only once, and then reuse that same code if the same 
idea is needed in a later project. After all, why “re-invent the wheel” each time we start a 
new project? We can go even further, though. There are certain ideas, certain data models in 
a software project, that are merely specific versions of a more general idea. We see the same 
general ideas occur again and again in different programs, with the only difference between them 
being that these very general ideas are used for different specific purposes in different programs. 
We gain the most flexibility and reusability if we try and model these more general ideas, and 
then adapt the ideas for their specific uses in each of our various projects. 

This is the idea of an abstract data type. An abstract data type (ADT) can be defined as:


1. a set of abstract data, and
2. a set of operations on that data 


What the data actually is — coordinates, user accounts, processes, or whatever — is not 
important. We view it simply as “data” or “components”. Likewise, how the operations are 
performed and how the values are actually stored is not important. Neither the specific meaning 
of the data, nor the implementation of the data storage and data manipulations, are part of an 
ADT specification. We simply want to provide a conceptual, abstract sense of what our data 
values are and how they are arranged, and a conceptual sense of what the operations do to the 
data. Then, the programmer can be responsible for implementing the particulars of the storage 
of values, or the particulars of how the operations are performed, and later, the particulars of 
what this data actually is.




Why is this technique worthwhile? It is worthwhile because ADTs offer two important 
features: 


1. Abstraction 
2. Encapsulation 


We will now further discuss each of these two ideas. 


1.5.1 Abstraction 


When we design data structures, we want to design them to cover the general, rather than the 
specific case, because this will allow for the maximum possible code reuse. Because of this, we 
like to examine the real-world ideas we are trying to model to see if perhaps there are certain 
features of these ideas that are common to other applications as well. This attempt to generalize 
our structures is known as abstraction. 


Example 1: 


• A stack of papers lies on your desk. You can place a new paper on top, or take the top
paper off and read it, but you can’t pull a paper from the middle without toppling the
stack, and so you don’t want to do this.


• A stack of cafeteria trays sits in a dorm lunch room. You may take a tray off the top; you
may take one look at the dorm food and then put the tray back on the stack (ordering
pizza afterward!). But, you can’t pull a tray from the middle of the stack of trays without
toppling the stack, making lots of noise and embarrassing yourself. So, you’d prefer not to
do this.


• As a computer executes function calls, it saves data for a particular function before calling
the next one:

















   main()
   {
      foo1();
   }

   foo1()
   {
      foo2();
   }

   foo2()
   {
      foo3();
   }

   Saved data, from the top of the stack to the bottom:

      foo2's data, saved before we jump to foo3
      foo1's data, saved before we jump to foo2
      main's data, saved before we jump to foo1





A function call places new data on the stack; returning from a function results in data
being removed from the top of a stack and restored. Restoring data from the middle of the
stack, however (for example, restoring the data for foo1() before we’ve finished returning
from foo3() and foo2()), could result in data corruption at best, and program collapse
at worst. So, this would be a bad thing to do.


Although all of these ideas are quite different, they all have some things in common. You 
can only add or remove from the top of your collection of objects, and the rest of the structure 
should not be altered or even read. Those are the most general details involved in all three 
applications above. Whether we are dealing with papers or cafeteria trays or a block of memory 
to store function data is a detail that is application-specific. But the restrictions on exactly 
where you can add to or remove from your collection of objects are general across all of the 
applications. These restrictions correspond to an abstract data type known as a STACK, which 
can be used to model all three cases above, and also many more. 
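A minimal sketch of how the STACK idea might look as a C++ class is given below. The class name, the member names, and the use of std::vector for storage are all assumptions made for illustration; the point is only that the public operations touch nothing but the top of the collection.

```cpp
#include <cassert>
#include <vector>

// A minimal stack sketch: elements can be added, read, or removed
// only at the top. The std::vector storage is one possible choice.
template <typename T>
class Stack {
public:
    void push(const T& item) { items_.push_back(item); }

    // Precondition: the stack is not empty.
    void pop()
    {
        assert(!items_.empty());
        items_.pop_back();
    }

    // Precondition: the stack is not empty.
    const T& top() const
    {
        assert(!items_.empty());
        return items_.back();
    }

    bool empty() const { return items_.empty(); }

private:
    std::vector<T> items_;   // hidden from users of the class
};
```

The same class could hold papers, trays, or saved function data; only the type filled in for T changes.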


Example 2: 


• A shipping transportation network, indicating cities and roads between them, is shown
below:





• An underground water system may be used to get water to various households. A pipe
may split into two at a certain point, only to have the two pipes converge together at a
later point:


• A group of computers can be connected together with Ethernet cables. Each computer
has unique properties (at the very least, a unique name), and the specific cables may each
have particular unique data as well (a unique length, for example):



































































All these applications have specific hubs, and some form of connection between hubs. Using
vertices and edges, you can model all three of these situations (and others as well) using
the abstract data type structure known as a GRAPH. Now, the vertices and edges have specific
purposes in each application, but the ideas of vertices and edges are common to all three
applications.
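As a rough sketch of the GRAPH idea, the class below stores named vertices and the connections between them. Everything about it is an assumption made for illustration: the class and member names, the use of strings to identify vertices, and the decision to treat every edge as two-way.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

// A minimal undirected graph sketch: vertices are identified by name,
// and each edge connects two vertices in both directions.
class Graph {
public:
    void add_vertex(const std::string& v) { adjacent_[v]; }

    void add_edge(const std::string& a, const std::string& b)
    {
        adjacent_[a].insert(b);
        adjacent_[b].insert(a);
    }

    bool has_edge(const std::string& a, const std::string& b) const
    {
        auto it = adjacent_.find(a);
        return it != adjacent_.end() && it->second.count(b) > 0;
    }

    std::size_t vertex_count() const { return adjacent_.size(); }

private:
    // For each vertex, the set of vertices it is directly connected to.
    std::map<std::string, std::set<std::string>> adjacent_;
};
```

Whether a vertex stands for a city, a pipe junction, or a computer is left to the application; the class only records which hubs are connected.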

That is the general idea of abstraction. We want to design our classes to cover as many 
similar cases as possible, and we do this by abstracting away the details of any particular 
implementation, and only dealing with the essential details inherent in the abstract structure. 


1.5.2 Encapsulation 


With encapsulation, the details of data storage and operation implementation are hidden from
the user. The user only needs to be aware of the abstract operations, or interface, allowed by 
the type. 

For example, consider the library desk at the U of I main stacks. At the main stacks, library 
users are not permitted to enter the stacks themselves. Users can only access books by asking 
at the desk. 

Imagine users have three available options, as seen in the given diagram: 


Request();  // user asks if a certain book is available,
            // gets yes or no answer


Retrieve(); // user asks to check out an available book from main stacks,
            // is given the book


Return();   // user gives book to desk,
            // it is put back on shelf in stacks


In the diagram, you can see that the library desk serves as the interface for the library. It 
is through the interface that the users can access the protected data (the book stacks) by way 
of the operations that the interface provides (Request(), Retrieve(), and Return()). Once 
the user “calls” an interface function, the library performs that operation using a particular 
implementation. The implementation that is chosen is irrelevant as far as the user is concerned;
all the user cares about is that the interface functions work as claimed. As long as the
implementation is designed so that it supports the interface, the details of the implementation 
don’t matter to the user. 













[Figure: Users call Request(), Retrieve(), and Return() at the library desk. Behind this interface, the library pages carry out the implementation, which accesses the book stacks.]























We can imagine how this library system may have been implemented before computers came 
along. The library pages (the workers at the desk) would have to do all the work! 


IMPLEMENTATION #1: 


Request() : page runs to stacks and checks for book, 
returns and tells user whether or not book is there 


Retrieve() : page runs to stacks and gets book, gives it to user 


Return() : page receives book from user, runs to stacks and
places book back on shelf 


The user, however, doesn’t need to know that the page is running around back there, or how 
the books are organized, or even that there are stacks. The user only needs to know that there 
are books somewhere (abstract representation of data) and that the three interface functions 
(abstract operations) help the user access these books. 

Now, imagine computers come to the library, lightening the workload on the pages: 


IMPLEMENTATION #2: 


Request() : page looks up book in computer to get information, instead 
of running back to stacks 


Retrieve() : same as before, but page must also note in computer 
that book is gone 




Return() : same as before, but page must also note in computer 
that book is back 


Now, the page doesn’t have to do as much running back and forth! The user, though, doesn’t 
need to know that. The user also doesn’t need to learn the new computer system. In fact, the 
user doesn’t even need to know that anything has changed! As long as Request(), Retrieve(),
and Return() still work as the interface describes, the library operates the same way, from
the user’s point of view, and so the user doesn’t need to be made aware of any changes in the
implementation. 

In the future, the work of pages may become even easier, due to robots: 


IMPLEMENTATION #3: 


Request() : same as before 


Retrieve() : robot gets book for user, page notes in computer that 
book is gone 


Return() : robot takes book from user and returns it to shelf, page 
notes in computer that book is back 


Now, the page never has to leave the computer; the robot does all the travelling! However, even 
though Request(), Retrieve(), and Return() may have been re-implemented, they are still 
written to support the interface, and so, from the user’s point of view, nothing has changed. 

Your code should work this way as well. You write your code to support an abstract interface. 
Later, you can re-write the implementation (for example, to make use of a newly discovered, 
faster algorithm), but as long as you continue to properly support the same interface, no other 
code has to change, since it all works through the interface. 
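In C++, one way to express the library example is an abstract base class for the interface and a derived class for each implementation. The sketch below is hypothetical throughout: the class names, the parameter and return types, and the single catalog-based implementation are assumptions, not something the notes prescribe.

```cpp
#include <set>
#include <string>

// The interface: the only operations a library user can see.
class Library {
public:
    virtual ~Library() = default;
    virtual bool Request(const std::string& title) const = 0;   // is the book available?
    virtual bool Retrieve(const std::string& title) = 0;        // check the book out
    virtual void Return(const std::string& title) = 0;          // give the book back
};

// One possible implementation: a catalog of the titles currently on the shelf.
class CatalogLibrary : public Library {
public:
    bool Request(const std::string& title) const override
    {
        return on_shelf_.count(title) > 0;
    }
    bool Retrieve(const std::string& title) override
    {
        return on_shelf_.erase(title) > 0;   // true only if the book was available
    }
    void Return(const std::string& title) override
    {
        on_shelf_.insert(title);
    }

private:
    std::set<std::string> on_shelf_;
};

// User code depends only on the interface, not on any implementation.
bool borrow_if_available(Library& library, const std::string& title)
{
    return library.Request(title) && library.Retrieve(title);
}
```

Because `borrow_if_available` is written against `Library`, a different implementation could later be substituted for `CatalogLibrary` without changing that function.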


1.5.3 Summary of Abstract Data Type idea 


So, essentially, an abstract data type is a way of expressing ideas about a collection of abstract 
data. We specify how we view this collection, conceptually, and we also specify what kind of 
operations we will allow ourselves to perform on this conceptual arrangement of data. But what 
we don’t care about are: 


1. what, exactly, this abstract data represents (abstraction means we view the data abstractly, 
instead of talking about components of a specific application and tie-ins to specific real- 
world ideas) 


2. how our operations are implemented (encapsulation means we care about what is being 
done to our abstract collection of data, but not how it is being done) 


If we design our types this way, we can use them over and over again in many different
applications. The implementations can be changed at will because code that uses these types only
uses them via the provided interface, and does not actually directly access the hidden
implementation. In addition, since the data type is coded from an abstract point of view, the user of
the code needs only to fill in the specific component type: what kind of data is actually being
stored in the type. If that component type changes, or if a new project is begun that can make
use of this same abstract idea but with a different component type, then this new component
type can be filled in and the type works in the same way.


1.6 Creating an ADT (from a specific type) 


In this example, we will generalize an array. 


Ordinary C++ array: 









































int myarray[10];

[Figure: a row of ten cells holding integer values, indexed 0 through 9.]


Now, the question becomes...what about this structure is too specific, i.e. what might
possibly be different in a different application?


1. Upper bound - we have 10 elements here; in some other application, we may want 20, or 
30, or whatever other number. 


2. Element type - here we are using the array to hold integers; we may want to hold chars, 
or complex numbers, or chess piece objects. 


3. Lower bound - there’s no particular reason to start at 0. A junior high school may want 
to access students by grade: 6..8. A theater may have the frontmost sections numbered 
101 - 108. 


4. Fixed bounds - there also isn’t a reason to have set lower and upper bounds; we may
want the ability to change the bounds as we use the array.


So, we will use those ideas and create a general array ADT. First, we need an abstract 
description of our data: what the array itself is. To begin with, we will have two integers,
upper and lower, with upper ≥ lower - 1. The interval lower..upper will be called the index
set. 

The array itself is a mapping from the index set to a value set. Given array X and an element 
i in the index set, the value X[i] is the element of X at index i. The type of this element is 
whatever type we are storing in our array. 

So, that is the abstract description of our array. To design an ADT, we need an abstract 
description of the data, and some operations on that data. What operations on the array might 
be helpful? Some potentially useful abstract operations are: 




Create/Initialize : make new array with certain default values 

Access(i) : return X[i]

Assign(i, v) : set X[i] to v 

SetEqual : set one array equal to another 

ChangeArrayBounds(l, u) : l and u are new array bounds; copy any of old
values whose indices fall within this range


There are other possibilities as well. 

So, that becomes our general Array ADT, because we now have an abstract description of 
the data, and operations to work on that data. The final question is, how can we implement 
this ADT? One possibility is to use linked memory: 





Node:  | index | value | ptr |


X:     | X[4] | X[5] | X[6] | X[7] |
          4      5      6      7


| 4 | X[4] | --> | 5 | X[5] | --> | 6 | X[6] | --> | 7 | X[7] |
































Nodes don’t even need to be in order: 





















































Placing them out of order, though, would make access harder to implement. 
A better implementation would be to use a C++ array, and translate the index used by the 
user into an index used by our implementation. 





data:  | X[4] | X[5] | X[6] | X[7] |
          0      1      2      3


X[i] = data[i-4]     // for i = 4 through 7


Either way, the interface is supported. The operations will be faster if we use the second 
implementation, but by changing from the first to the second, no code that used the interface 
functions would have to be changed. 
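The second implementation might be sketched as follows. The class name `IntArray`, the restriction to int elements, and the choice to fill newly exposed positions with 0 are assumptions made to keep the example short; the operation names follow the abstract operations listed earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the array ADT stored in contiguous memory. User indices in
// lower..upper are translated to positions 0..(upper - lower) in data_.
class IntArray {
public:
    IntArray(int lower, int upper, int initial = 0)
        : lower_(lower), upper_(upper)
    {
        assert(upper >= lower - 1);   // upper == lower - 1 gives an empty array
        data_.assign(static_cast<std::size_t>(upper - lower + 1), initial);
    }

    int Access(int i) const { return data_[position(i)]; }
    void Assign(int i, int v) { data_[position(i)] = v; }

    // Adopt new bounds l..u, keeping old values whose indices are still in range.
    void ChangeArrayBounds(int l, int u)
    {
        IntArray resized(l, u);
        for (int i = l; i <= u; ++i)
            if (lower_ <= i && i <= upper_)
                resized.Assign(i, Access(i));
        *this = resized;
    }

private:
    // Translate a user index into a position in data_,
    // e.g. X[i] = data[i - 4] when lower is 4.
    std::size_t position(int i) const
    {
        assert(lower_ <= i && i <= upper_);
        return static_cast<std::size_t>(i - lower_);
    }

    int lower_;
    int upper_;
    std::vector<int> data_;
};
```

Since the translation is a single subtraction, Access and Assign take constant time regardless of the bounds chosen.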

One final step that will be vitally important for us this semester: we will use templates to 
abstract away the specifics of the type. In other words, we will not write: 


array of int 

array of char 

array of float 

array of (insert your own type here) 


Instead, write 
array of T 


where T is a generic type. We define our class with this generic type, and then when we want 
to use the class to allocate a specific object, we fill in the generic type T with a specific type. 
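A small sketch of what "array of T" can look like in C++ is shown below. The class name `Array`, its two operations, and the zero-based indexing are assumptions chosen to keep the example brief; the point is that the class is written once and the element type is filled in at the point of use.

```cpp
#include <cstddef>
#include <vector>

// Sketch of a class written once in terms of a generic type T.
template <typename T>
class Array {
public:
    Array(std::size_t size, const T& initial) : data_(size, initial) {}

    // at() reports an out-of-range index by throwing std::out_of_range.
    const T& Access(std::size_t i) const { return data_.at(i); }
    void Assign(std::size_t i, const T& v) { data_.at(i) = v; }

private:
    std::vector<T> data_;
};
```

Declaring `Array<int> scores(10, 0);` or `Array<char> letters(3, 'x');` fills in T with a specific type, and the same class definition serves both.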




